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Abstract 

Diffusion tensor imaging (DTI) is widely used to characterize, in vivo, the white matter of the central 
nerve system (CNS). This biological tissue contains much anatomic, structural and orientational informa¬ 
tion of fibers in human brain. Spectral data from the displacement distribution of water molecules located 
in the brain tissue are collected by a magnetic resonance scanner and acquired in the Fourier domain. 

After the Fourier inversion, the noise distribution is Gaussian in both real and imaginary parts and, as a 
consequence, the recorded magnitude data are corrupted by Rician noise. 

Statistical estimation of diffusion leads a non-linear regression problem. In this paper, we present a 
fast computational method for Maximum Likelihood estimation (MLE) of diffusivities under the Rician 
noise model, based on the Expectation Maximization (EM) algorithm. By using data augmentation, we 
are able to transform a non-linear regression problem into the Generalized Linear Modeling (GLM) frame¬ 
work, reducing dramatically the computational cost. The Fisher-scoring method is used for achieving fast 
convergence of the tensor parameter. The new method is implemented and applied using both synthetic 
and real data in a wide range of fo-amplitudes up to 14000 s/mm 2 . Higher accuracy and precision of the 
Rician estimates are achieved compared with other log-normal based methods. In addition, we extend the 
ML framework to the maximum a posterior (MAP) estimation in DTI under the aforementioned scheme 
by specifying the priors. We will describe how close numerically are the estimators of model parameters 
obtained through ML and MAP estimation. 

Keywords data augmentation, Fisher scoring, maximum likelihood estimator, maximum a posterior esti¬ 
mator, Rician Likelihood, reduced computation 


1 Introduction 

Diffusion tensor imaging (DTI) is a powerful tool to detect, in vivo, the white matter anatomy and struc¬ 
tures of the brain. The raw MR-data are collected by a magnetic resonance scanner and consist of spectral 
measurement from the displacement distribution of water molecules constrained into cellular structures. 
Diffusion anisotropy characterizes the nervous fibers. 
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After the Fourier inversion, the MR-signals are corrupted by a complex Gaussian noise, and conse¬ 
quently, the recorded measurement magnitudes, referred as diffusion weighted magnetic resonance imag¬ 
ing (DW-MRI) data, will follow the Rician distribution. The noise distribution, however, will still stay 
Gaussian in both real and imaginary components. The simplest method for diffusion tensor estimation 
(DTE) is based on the linearized log-normal regression model, where the residual variance is assumed to 
be either constant (Least Squares) or depending on the signal amplitude (Weighted Least Squares). These 
Gaussian noise models fail to fit the high frequency data, which carry information about the higher order 
diffusion characteristics. In the existing literature Rajan, J. et al. < 2011} ; Veraart, J. et al. < 2011| |; Andersson 
J.L.R. (20081 on the ML-estimation of diffusion tensors under the Rician noise, the maximization algorithm 
involves repeated computation of modified Bessel functions. By using data augmentation we are able to 
replace the Rician likelihood by a Poisson likelihood which is standard in the framework of GLM. 

Such simplification reduces dramatically the computational burden of the Fisher-scoring maximization 
algorithm. This applies also at high 6-amplitudes, where in the low signal regime measurements below a 
threshold are customarily coded as zeros. In the standard LS or WLS approaches, zero-measurements are 
problematic since they cannot be fitted by a log-normal distribution, and simply discarding them induces 
selection bias. The appropriately modeled noise level provides capability of data correction in further 
insights, e.g. removing artefacts from the raw data. 

This paper is structured as follows. Section [2] describes data augmentation and specifies the statistical 
model for DTE. In Section [3] we discuss the implementation of the EM and the Fisher-scoring algorithms 
in the DTI context. In addition, we also specify priors for the parameters and discuss the computation of 
the Maximum a Posteriori Estimator (MAPE) under the same scheme. Section|4]illustrates the results from 
both synthetic and real data. In Section[5]we conclude with an overview of the methods and the undergoing 
developments. 


2 GLM for MRI observations 


2.1 Rician noise in MRI 


In magnetic resonance imaging (MRI), we usually need to take the noise in the raw MR-acquisitions into 
account. The complex valued noise e is composed of two i.i.d. Gaussian random variables with zero mean 
and variance <r 2 , one for the real and the other one for the imaginary component. After the Fourier inver¬ 
sion, the signal intensity S > 0 is corrupted by the the complex Gaussian noise, and Y = .S' + e\ will be 
observed. 

Consequently, the observed MR-signal magnitudes follow a Rician distribution resulting in the likeli¬ 
hood function 
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where I a is the a-order modified Bessel function of first kind. For a = 0 it has also the following represen¬ 
tation in terms of Gaussian hypergeometric series Gradshteyn, I.S., Ryzhik, I.M. (2007): 
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Let t = S 2 /(2a 2 ), then Eq. 12.1 1 gives 


>-{Y e dy) = exp ^—t — ^-^jl 0 ^y/2tjdy 


(2.3) 


with r = yS/ (2a 2 ) = V2ty/(2a). 


2.2 Data augmentation 


We follow the strategy presented in Gasbarra D. et al. ( 2014) introducing an augmented data N from a 
Poisson distribution with mean t > 0. The likelihood of the observed data can be transformed from the 
Rician likelihood Eq. (|2.3) to a joint augmented density 


Pt,<r*(N = n,Y 2 G dy 2 ) = P t)<7 2 (TV = n,X £ dx) 
= P t (N = n)P a 2 (X <E dx\N = n) = 
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where X is from the conditional distribution Gamma(iV + 1, 1/(2 <t 2 )) given N. Eq. (2.41 provides a trans¬ 
formation from a non-linear regression problem to the GLM framework 


ftA z ) = c(z,0)exp 
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with 2 corresponding to the response in general, see McCullagh, R, Nelder, J.A. (1989) for more details. 


3 Method 

3.1 DW-MRI and parametrization 

In DW-MRI, the signal is modeled as the first equality 

S( q) = Sq exp(— 6d(g)) = S 0 ex.p(Ze), 

where the control vector q £ R 3 is determined by the sequence of gradient pulses, b = |q| 2 , and g = 
q/|q| G S 2 is a vector of unit length. The MR-signal decays exponentially with respect to the 6-amplitude. 
Depending on the gradient direction g the decay is modeled by the reflection symmetric diffusivity function 

d:S 2 -> R+. 

Great efforts have been devoted to modeling the diffusivity, and in general we can have parametrization 
as the second equality. In the simplest model the diffusivity is expressed by a symmetric and positive 
definite rank-2 tensor D € K 3x3 , giving 

logS(q) = log S 0 - bg T Dg = log So + Z6 , 

where in the left hand side the diffusion tensor is parametrized as 

d (^1 , . . . , 60 ) • (Dxx ; Pyy: Pzz : Pxy' Pxz ; Pyz ) 

with a design matrix 

Z = Z( q) = -6(g 2 , gy, g 2 z , 2g x g y , 2g x g z ,2g y g z ) . 
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i, the diffusivity is modeled 

with a totally symmetric cartesian tensor D of order n £ N, as 

3 3 3 

d(g) := E X!'" E D ^^2,-,e n 9h9e 2 ' ■■gtin ■ 

^1 = 1 t 2 —1 ^2r,=l 


In high angular resolution models (HARDI) (see e.g. Barmpoutis A. et al. (2009 


3.2 EM in MLE 


In the optimization of the likelihood, we employ the EM (Expectation - Maximization) algorithm, which 
is one among the iterative methods in the MLE or in the Maximum a Posterior Estimation (MAPE). The 
EM algorithm proceeds in two steps and shortens the computational complexity by using augmented data. 
In terms of our case, in the E-step we calculate the expectation of the log likelihood w.r.t the conditional 
distribution of N given by the observations and other parameters with fixed values. In the M-step, we find 
the ML parameter of Sq and a 2 by maximizing the augmented log likelihood quantities. The computational 
details are listed in Appendix [B] 

The log likelihood from Eq. ||2.4) is expressed as 


Y 2 


Q ■= log [p t , a 2 {N = n,Y)) = c(Y,N) + tVlog(t) - (N + 1) log(er 2 ) - t- 


(3.6) 


where c(Y, N) = N log(y 2 ) — 2 log(A!) — ( N + 1) log(2) does not depend on (t, a 2 ) which will be omitted in 
the M-step. From Section 3.1 we have t = Sq exp(2Z0)/2a 1 . 

In the EM-iteration, given the current parameter estimates {0 {k ' > , S 2 ^, c 2 ' l > ), we update the conditional 
expectation of the augmented data by 


(tV) (fc) := E m ^ k) (N\Y) = with 
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In the M-step we update a 2 and Sq by the recursions 
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where m is the number of acquisitions at each voxel. 

For the tensor parameter 6, we employ a stabilized Fisher scoring method: given the stabilizing param¬ 
eter a £ [0,1], we iterate the recursion 


-l 


9^9 + ^(1 — a)J{9) + aS(9) T S(9)j S(9), 

until convergence to a fixed point Lange K. (2013). In Eq. ( |3.9| the score S(0) is given by 

m m 

S(9) = 2E W) (fe) - (S { 0 k) /<J (k) ) 2 J2 c M^) z 7 , 

i =1 i =1 

and the corresponding Fisher information is 

m 
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The initials of the EM algorithm can be obtained through the least square (LS) from a truncated dataset 

i 2 in order to fit the Gaussian model (see Jones 


with the diffusion weighting ranging from 0 ~ lOOOs/r 


D.K., Basser P.J. 1(12004}, Barber, P.A. et al. (|1998}). To pursue higher quality of the initials, we could further 


apply the weighted least square (WLS) described in Zhu H. et al. < j2007[ ). In the Appendix |C| we compare 
the differences between our EM algorithm and the direct optimization of the Rician likelihood in Eq. 0, 
which is commonly used to compute the MLE in DTI. It should be noted that the EM algorithm is needed 
because of the latent augmented variables; it does not decrease the marginal likelihood of the data, see 
Appendix |A| for the proof. 


3.3 EM in MAPE 


In the Bayesian framework, the Maximum a Posterior Estimation (MAPE) aims to obtain the point estimates 
by maximizing the posterior density. The difference between MLE and MAPE in this scenario is in the prior 
probability 7 t(£). Given the data y, the normalizing constant in the posterior density 7r(£|y) does not depend 
on the parameter f\ We find the MAPE by maximizing the joint density (y), and this is achieved by 

iterating the EM-recursion with the penalization log 7 t(£) 

£ (fe+1) = argmax|E; € ( fc )(log^(z,y)|y) +log7r(£)j (3.10) 


until convergence to a fixed point. The log-prior penalization term has a regularizing effect, which vanishes 
asymptotically as the sample size grows Andersson J.L.R. < [2008 1. 

In DTE, we can assign conjugate priors in light of Section 3.2 for a 2 and Sq. Since we have little knowl¬ 
edge of the tensor parameter 9, we may choose non-informative priors which are either scale- or shift- 
invariant Jaynes E.T. (2002). A simple Bayesian hierarchical model is obtained after the following choices: 


• a 2 has scale invariant improper prior with density 7r(er 2 ) oc 1/a 2 , 


• Sq ~ Gamma(ci, C 2 ), where a, C 2 are very small. 

• 9 £ R d has the isotropic centered Gaussian prior _\ r (0. ID 1 ), where Q is a d x d precision matrix. 


The penalized EM-updates for MAPE are given by 

( a ( fe+ 1) ) 2 = Q^((5f) 2 ex p(2Z i 9^)+Y 2 )^ j ^(2(A i )W + 1) + l) (3.11) 

and 

(■ s o k+1) ) 2 = (l><> (fc) +ci) /(^yjrE( ex p( 2 ^ 0(fc) ) + c 2 ) ■ (3.12) 

Additionally, this gives the modified score and Fisher scoring 

S(0) = S(9 ) — Q9, and J = J(9) + 0, respectively. 

Under our Bayesian model with weak priors the MAP estimation Eq. ||3.11} and Eq. (J3.12J are similar 

m 

as the ML updates Eq. ( ]3.7| ) and Eq. ( |3.8| l. Indeed, usually (Ay) 1, and we can omit the difference 

i —1 

between Eq.| |3.7) and Eq. f3.ll} . Then when Ci and C 2 are small enough, the difference between the likelihood 
and posterior mode of Sq, expressed in Eq.f3.8[l and Eq. f3.12} respectively, can also be ignored. The only 
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difference when updating 6, is that we have considered the correction between the elements of a tensor 
represented by the prior distribution, the inverse covariance matrix, tl Such correction may be ignorable. 

Remark: Sometimes the MLE can be treated as a special case of the MAPE where the precision of the 
parameters depend on the chosen prior. If the effects of the priors are weak enough to be ignored, then the 
posterior distribution is asymptotically approximated by the likelihood. The consequence is that numeri¬ 
cally the MAP tend to the ML estimates numerically. Such remark is not unusual (see 
(20001) but nearly has never appeared in the DTI literature. 


Sparacino, G. et al. 


4 Results 


4.1 Synthetic Data 

Synthetic data sets were simulated by choosing a positive tensor of 2nd-order and of 4th-order with fixed 
So and the noise variance a 2 . The simulated data sets in the experiments arise from models with param¬ 
eter values resembling the real scenario. Every dataset contains 1440 measurements which were sampled 
from 32 distinct gradients and 15 distinct increasing b values (knots) up to 14000s/mm 2 , each repeated 
three times. The ground truth (GT) of high (H-) and low (L-) Rician noise, cr 2 , are 93,0405 and 12,8821, 
respectively. To compare the performance, we first plot the ML estimated Rician signals and the GT shown 
in Fig. [T] Figj2] shows the empirical signal to noise ratio (SNR) of the distinct b values from the first 480 
measurements as an illustration. For comparison of the methods, we simulate 100 datasets from high noise 
case under the 4th-order tensor and compare the first 480 measurements of the sample means of SNR and 
the GT under different methods in Figj3| where denotes that only the low frequencies (b values less 
than 1000 s/mm 2 ) are considered in the estimation. This figure reveals that our MLE and the WLS under 
a truncated dataset fit the GT well. However, when comparing the mean square errors (MSE) of the noise 
variance, the WLS* has a huge bias, 54.777, compared with the MSE of the ML estimates, 10.358. The fitted 
Rician signals are depicted in Fig. |4j where the estimated signals are retrieved from a low b and a high 
b -value cases estimated from the first 480 measurements. It reveals that in the low b-value case both the 
WLS* and the MLE perform well. But our ML estimates show advantages in the high b-value case. The 
reason is that for the WLS* we use data information which do not consider the high frequencies, but only 
fit back to fetch reliable estimates of signals. We further check the MSE on tensor coefficients from the 
15 distinct b values. The results are described in Figj5j where we can conclude that the MLE is the best 
one compared with other methods. The average computational time of the aforementioned MLE method 
under the 4th-order tensor model is 0.4868 seconds, which is extremely shorter than the minutes running 
time per voexl from the current standard methods such as MATLAB Nelder-Mead based or gradient-based 
estimators (see |Ghosh A. et at ( 2014) ; Landman B. et al. ( 2007) ). 


4.2 Real Data 

The data consist of 4596 diffusion MR-images of the brain of an healthy human volunteer, taken from four 
5mro-thick consecutive axial slices, and measured using a Philips Achieva 3.0 Tesla MR-scanner. The image 
resolution is 128 x 128 pixels of size 1.875 x 1.875 mm 2 . After masking out the skull and the ventricles, we 
remain with a region of interest (ROI) containing 18764 voxels. In the protocol, we used all the combinations 
of the 32 gradient directions with the 6-values varying in the range 0 — 14000s/mm 2 , with 2 — 3 repetitions, 
for a total of 23 323 644 data points. The average computational cost per voxel by our method the 4th- 
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Figure 1: Fitted Rician signals by the proposed MLE method. The blue curves depict the signal intensities of 
the GT. The red and green curves show the fitted signals under the 2nd-order tensor model from the high- 
and low- noise level datasets. Correspondingly, the black crossings and the cayn stars are the empirical 
values under the 4th-order tensor model. 



Figure 2: Empirical SNR as functions of b values. The GT of the upper curve is from the high SNR cor¬ 
responding the lower noise level with cr 2 = 12,8821. The bottom one has the high noise level with 
a 2 = 93, 0405. The red curves are fitted SNR under the 2nd-order tensor model. While the green stars 
represent the empirical SNR under the 4th-order tensor model. 

order tensor model from this dataset is 1.8331 seconds. We illustrate the results mainly under the 4th-order 
tensor model. Fig j6] shows the mean diffusivity (MD) and the fractional anisotropy (FA) of diffusion from 
two consecutive slices, where FA is computed from the results under 2nd-order tensor model, which is 
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Figure 3: Sample mean of SNR. The sample means are calculated from 100 simulated datasets. The SNR are 
estimated by different methods. The blue curve represents the GT. The cyan curve and the green stars are 
the estimates by the LS and the WLS with the truncated datasets, respectively. The red curve is the results 
through the WLS, and the black crossings are the empirical values by our MLE method. 


given by 


FA = 


V3((A! - E[ A]) 2 + (A 2 - U[A]) 2 + (A 3 - E{ A]) 2 ) 
\/2(Xj + A| + A§) 


(4.13) 


The average values of FA from these two ROI are 0.2769 mm 2 /s and 0.286l mm 2 /*', respectively. The color 
in FA represents the orientations of the fibers. Under the 4th-order tensor model, MD is expressed as 


MD = -(Him + -D1122 
+ 2 U 2233 ) = ^trace(D). 
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2D 


3333 


(4.14) 


The average values of MD from Slice 3 and 4 are 6.248e-03 mm 2 /s, 6.045e-03 mm 2 /s, respectively, and we 
have the same estimated values of MD under 2nd-order tensor model. 

We also plot the Rician noise map of a from the two consecutive slices shown in Fig. [7j where the 
artefacts are clearly depicted by white color representing very high noise, which reveals the true scenario 
from the raw MR images. 

Visualization of angular resolution of DTI data under different tensor models from the region of interest 
(ROI) of two consecutive slices are displayed in Fig. |8j where the ROI is near the hippocampus and the 
empty spaces inside of left parts of the diffusion profiles (DP) are the masked ventricle. DP depict under 
the 4th-order tensors providing much angular information of diffusion, where the colors represents the 
principle orientations of diffusion at each voxel. These tensor profiles are plotted by MATLAB fanDTasia 
toolbox Barmpoutis A. et al. |2007fr. 
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(a) low b value, b = 62 s/mm 2 



(b) high b value, b = 14000 s/mm 2 

Figure 4: Sample mean of the Rician signals from low and high b values. The plots illustrate the means of 
the signal intensities at b = 62 and 14000s/to?ti 2 , respectively, estimated by by different methods. 



Figure 5: MSE on tensor coefficients 
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(a) Mean diffusivity (MD) 
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slice 3 

slice 4 


(b) Fractional anisotropy (FA) 


Figure 6: MD and FA maps from two consecutive slices, where the estimated FA are computed under the 
2nd-order tensor model. The color in FA represents the orientations of the fibers. The color coded FA maps 
are drawn by using the software ExploreDTI Leemans A et al. (20091. The corresponding MD maps are 
from the results under the 4th-order tensor model, where the white spots corresponding to the corrupted 
data (artefacts) with measured magnitudes increasing at high 6-value. 



Figure 7: Rician noise map from two consecutive slices. The white curves in the left bottom of the slices 
depict the artefacts corresponding to very high noise. 

5 Discussion 

Our method substantially differs from the previous ones in the literature and the advantages are summa¬ 
rized by the following points: 1) We introduce a novel data augmentation, which allows the non-linear 
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Figure 8: Visualization of DTI data with 4th-order tensors from a ROI. The color-code represents the main 
direction of the principal eigenvalue of the 2nd-order tensor: Red, left-right; Green, anterior-posterior; Blue, 
superior-inferior. 


regression problem to be transformed into the GLM framework in DTE. 2) Subsequently, the computation 
is dramatically reduced due to the tractable modes of parameters of interest in the sense of point estimation. 
In addition, when employing Fisher-scoring scheme we simplify the complexity of the Fisher information. 
3) Our Rician noise model can be combined with any tensor model in different representation, such as 
spheric harmonic expansion, by reparametrization. 4) Either ML or MAP estimation yields more accurate 
estimates than the LS and WLS do. In addition, high frequencies from the low SNR data and the zero 
measurements are also included into the estimation. These data are known to contain detailed anatomical 
information of the complex tissue in vivo. 5) Our method leads to significantly less biased estimates of the 
noise level, which plays key role in denoising the MRI and cleaning the artefacts. 

Positive Constraints. The physical feature of diffusion requires the tensor to be positive definite. Our 
model allows to check the positivity of diffusivity in the tensor updates under the scheme of Fisher-scoring 
method. For the rank-2 tensor model, the constraining is fairly easy to do by computing the eigenvalues of 
the tensor matrix D. For FIARDI, Barmpoutis et al. Barmpoutis A. et al. <|2009[l propose the Gram matrix 
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approach, using the quartic form to guarantee the positivity. Other methods such as Qi L. et al. < 2010 1 
address the constraint by calculating the Z-eigenvalue polynomials. 

MLE VS MAPE. In this work, we did not list the results from MAPE but we emphasize the differences 
between these two methods. Bayesian methods have advantages in the learning process, meaning that they 
may gain extra information from the prior knowledge. When the prior is weak, like in our case, we learn 
things from the data, what we actually do when approaching the problem through frequentist statistical 
modeling. In order to learn the uncertainty of the diffusion parameters, a fully Bayesian approach is highly 
recommended to characterize the posterior parameter distributions rather than point estimation. 


6 Acknowledgement 

We thank Professor Antti Penttinen for reviewing the manuscript and providing insightful comments. We 
would also like to thank the Radiology Unit of Helsinki University Hospital for the data collection. This 
work was funded by Doctoral Program in Computing and Mathematical Sciences (COMAS), University of 
Jyvskyla. We acknowledge the Finnish Doctoral Programme in Stochastics and Statistics (FDPSS) provided 
travel funds for this research. 


Appendix 

A Theory of the EM algorithm 

Consider a statistical model (pe{y), 9 & 0), where 0 C W l , and the likelihood of the observed data y = 
(yi,... ,y n ) is expressed as the marginal of an integrated joint likelihood 


Pe(y) = J Pe{z,y)dz . 


Here z = (zi,..., z n ) € Z and z t are interpreted as latent variables. When Z is discrete, we replace integrals 


by sums. In the EM algorithm Dempster, A. P. et al. (19971, starting with an inital value 9^ € 0, we iterate 
the maximization step 

6 > (/c+1) = argmax|u e(fe )(logp e ( 2 :,y)| 2 /)| = axgmzx^J \ogp e (z, y)p e m (z|y)dzj , (A.l) 

where the integration is with respect to the conditional density 

/ i \ Pew{z,y) /ri r i ■, 

Pgm \z\y) = -rw (Bayes formula). 

Pewiy) 

By Jensen inequality, the Kullback relative entropy of the conditional distribution pe(z\y) related to pgw ( z\y ), 
given by 

is non-negative, which implies 

log pe{y) ~ \ogp eW {y) > 


v]= !P° g U<%) 


p e w{z\y)dz 


j z log (pe(z, y))pew {z\y)dx - J^ log (p gW (z, y))p e m ( z\y)dx 


(A.2) 
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and consequently 


\ogp e( k+i){y) > log p e w(y) , 


i.e. the EM-step does not decrease the marginal likelihood of y. It follows also from (A.21, that fixing a 
0 -subvector and maximizing with respect to the remaining 0-coordinates does not decrease the marginal 
likelihood of y. The EM algorithm is iterated until convergence to a fixed point 9^°°\ a local maximum of 
the marginal likelihood pg(y). When the local maximum is the global one, 9ml = 9 lryZ) is the maximum 
likelihood estimator of the parameter. The advantage of the EM algorithm is that, for some smart choices 


of the data augmentation z and the joint density pe(z, y), the maximization step (A.l I can be simpler than 
maximizing directly the marginal likelihood pg{y), especially in cases where the latter is hard to evaluate. 


B MLE by the EM algorithm in DTI 

Appendix |B] gives details of the expectation-maximization (EM) algorithm in DTE. We consider the Rician 
noise model with the Poissonian data augmentation of Section[2] The latent augmented variable N condi¬ 
tionally on A, Z is given by 


p t<a (N = n\X,Z) = 


1 exp(— 2t)t 2 


n € N, with t = X\j and x = Y 2 


/ 0 (2r) (n!) 2 

It follows Gasbarra D. et al. (20141 that this discrete distribution is referred as reinforced Poisson distribution 
with parameter r. 

In the EM algorithm we need to compute the conditional expectation of N conditionally on X and the 
design matrix Z. Given the current values i<*>, ( T 2 ^) 


{N) W ;= E tW ^ k) (AT| A, Z) = J2 n PtA N = n\X, Z) 

n—1 

= log„fi(l,(r<*>) 2 ) = r^/2-Yj log J„(2r<»,/=D = 


dr 

T (fc) / 1 (2 r ( fe )) 

/o(2r( fc )) 


J 0 (2r( fe )^T) 


with 


,(k) _ +rcf 2(fc) P (k) 2 (fe)_ Sg (fc) exp(2Z0( fc) ) 


and 


T (fe) = 


Vx t 


exp (zA k) )S 0 {k) - 


a 2{k) V2 

Note that oTi(l, r 2 ) = ,/o(2r\/—1) = / 0 (2r), where Jo(z) is the zero-order Bessel function of the first kind, 
Iq(z) is the zero-order modified Bessel function of first kind, which satisfies 


d v (t) — 

x 


and 


J-n(x) = (-1 ) n J n (x), I n (z ) = i n J n (zi). 

In the M-step, we maximize the parameters of the augmented log likelihood Q from Eq. | |2.4) w.r.t 
(0, a 2 , Sq). Omitting the items not depending on these parameters, Q can be expressed as 


i=1 


J2 ( lo g(-Sg) - 21og(a 2 ) + 2ZiO ) (Ni)W - mlog(a 2 ) - ex P( 2 ^) + X i) ■ 


(B.l) 


i= 1 
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It is easy to see in Eq. ( |B. 1| | that the log likelihood w.r.t a 2 and Sq are inverse Gamma and Gamma distribu¬ 
tions, respectively. Hence, we update these two parameters by their modes: 


<l 2 ml ■= argma x(Q) = 


and 


$o ml '■= argmax(Q) = 


E^i(^ + exp(2^)5 0 ) 

2E^(2W + 1) 
2^Er=iW 


s o E™l ex P( 2 -^ 0 ) 

To apply the Fisher scoring method, we have the score of 9 is 


S{9) = 2 5Zexp(2 Z t 9)Z tl 


i=l 


a ML i =1 


and the Fisher-information is given by 


J{6) = E 


d 2 Q 


d9 h dO k 


S 2 


0 ML ST 


<l 2 ml i =i 


exp(2 Z&ZiZ} 


(B.2) 


(B.3) 


(B.4) 


(B.5) 


C Maximization of Rician Log-likelihood 

Without data agumentation, we have to directly maximize the Rician log likelihood QRician, in short Q r 
thereafter, by using some typical MLE method, such as gradient descent. Then the first (the score) and 
second derivatives of Q r are usually required. The loglikelihood Q r is 

Q r = const. - mlog(cr 2 ) - 51 (y? + exp(2 Zid)S% \ + log J 0 f ^ cx p(^)\/^o 

2=1 x ' z=l x 

where I k (r) are modified Bessel functions of first kind satisfying 

I o( T ) = h(r), Iq{t) = i'i(t) = (I 0 (t) + / 2 (r))/2. 


The score of a 2 and Sq are respectively given by 

^ - ^ + ex P (2 ZA)S 2 ) - ^J29( Y ^MZ i e)So<J-AY l eMZ l 0)S o (C.l) 

2=1 ' ' 2=1 ' ' 

and 


dQ r 

dS 2 


-^ ^ exp(2 Zj9j) 


2=1 


2a 2 ^np YI9 ( Y i exp(Zi9)S 0 a~ 


Yi exp (ZiO). 


The score of 9 is given by 


dQr 
a Ok 


C 2 m 1 m / \ 

5I exp (2 Zfii)Z ik + ^^2g(Y i exp(Z i 9)S 0 a^jY i exp(Z i 9)SoZ ik . 

2=1 2=1 x ' 


(C.2) 


(C.3) 


14 














The Hessian of 9 is given by 

dQ 2 ^ 


dO h dO k 

+ 9 


9 S' 2 S f / \ 

= ^ ex P (2Zi0i)ZihZik + Y2 Y i ^>{Zi6)Z ik Z ih ^g exp (Zid)S 0 cr~ 2 J 


= V Z ih Z ik (-iti + n(ff(n) + ng'(n)'j = ^Z ih Z ih ^-4tf + t? - r; { J 

h(r) 


i=1 

where we denote 

Y i exp(Z. l 9)S 0 


Ti = 


2a 2 


9 (t) = — log I 0 (r) = , 

dr / 0 (r) 


• f x d 2 1 ^ h(T)\ ( Ii(r) 

s(T) = _i ogWT ) = ^i + _j_^_ 

with 


= 1 - 


h{r) (h{j) 


J o (r) \Io{t) 


h{j) = I 0 (t) - 


2/i(r) 


For SNR > 10, the corresponding Fisher-information matrix is approximated by 


I r (&) = E 

where (see Andersson J.L.Rj< 2008) ) 

E 


dQl 

de h de k 


/ Q 2 i 

E Zih,Z ik [ §1 exp(2Zi6) - \ 


hin) 


loin) 


2 / ’ 


% exp(2 ZiO)) + exp(2 Z z 9) - 1 
a z J a z 2 


(C.4) 


D Method Comparison 

In this section, we discuss the differences between our data-augmentation based on the EM algorithm and 
on the typical MLE method through direct maximization at Q r . 

1. We do not need to calculate all the elements of the Hessian as we can directly find the modes of Sq 
and a 2 by data augmentation. A small improvement appears in the reparametrization of Sq or log So 
by S 0 2 . 

2. In the E-step we compute 


(Ni) — £ l g( fc) ^ o . 2 (fc) ; 52 (»t) 


(D.l) 


which does not depend on the parameters 9, a 2 and Sq. In the M-step we use Eq. D.l the recursive 
values from 9^ k \ a 2<k> . S ( 2 instead of solving the intractable formula w.r.t those parameters. That 
dramatically reduces the computation of the score from Eq.(C.2 C.l C.3) to Eq.( B.3|lT2 B.41, respec¬ 
tively. 

3. The EM algorithm allows us to use empirical values from Eq.(|TXT} to compute the Fisher informa¬ 
tion. Our Fisher information J{9) which fits the whole range of SNR and is slightly bigger than the 
approximated one, I r (0), expressed in (Eq. Q), which requires heavy mathematical calculations 
to deal with different expectations (see Andersson J.L.R. < [2008[ ) for more details). In addition, when 
computing the score of 9 in Eq. 3.9 we do not need to update the items containing iV) as they are fixed 
values from Eq.fU.l [l. All those lead reduced computation in practice. 
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